Script for cleaning water level data from HOBO U20L-01 water level loggers. Designed for surface water level measurements. QA/QC steps include:

  1. removing bad data
  2. converting stage to water depth
  3. checking results against manual measurements.

Requires HOBO raw data and manual measurement spreadsheet

1. Clean By Site

  1. Open a specific site with opn_concat_psite function. This will open all .csvs within a specified year and transect/site. The resulting dataframe should iclude all columns in .csv (renamed to be short and interpretable) and added columns for instrument serial number and a transect position where a transect is a stream location with its corresponding wells (e.g. lw1 represents data from the left riparian well)

Check that data matches column names

interfiles <- 'formatted_data/stage_pressure'
year <- '2020'
# possible locations: 'dh', 'est_louis', 'fool', 'lexen'
location<- 'fool'
site<- 'fc2'

pstage_raw <- opn_concat_psite(interfiles, year, location, site)

  1. Convert pressure to water level using data from the instrument in the ‘atmospheric’ transect position. Water level (h) is given by h=(P-Pr)/(rho * g) where P is the pressure measured by the sensor in the stream, Pr is the pressure measured by the reference sensor (atmospheric pressure), rho is the density of the water in the stream, and g is the acceleration due to gravity.

The density of water is temperature dependent, however, our stream temperatures have a small range…

##   mintemp maxtemp
## 1   1.872   5.655

where the density will not differ much from one data point to another.

This will generate a dataframe with an added column for ‘depth’ based on an equation using the difference between the atmospheric pressure and the well pressure on for the same date and time.

pstage_depth <- inner_join(filter(pstage_raw, position =="atmospheric"), 
            filter(pstage_raw,position != "atmospheric"), by= c("datetime")) %>%
  mutate(depth_m = (pressure_kPa.y-pressure_kPa.x)/(0.999*9.98)) %>%
  select(ID.y, datetime, pressure_kPa.x, pressure_kPa.y, depth_m, position.y, instrument_no.y, temperature_C.y)

Explore missing timesteps

The check Timesteps function will generate a dataframe of missing timesteps by showing every row of the input df where the difference between datapoint times does not equal the interval between the first two datapoints.

checkTimesteps(pstage_raw)
##                     ID pressure_kPa temperature_C    position instrument_no
## atmospheric.NA      NA           NA            NA        <NA>          <NA>
## atmospheric.38017 6337       71.283        -7.010 atmospheric      20833949
## atmospheric.41101 6851           NA            NA atmospheric      20833949
## atmospheric.41102 6852           NA            NA atmospheric      20833949
## atmospheric.41103 6853           NA            NA atmospheric      20833949
## lw1.NA              NA           NA            NA        <NA>          <NA>
## lw1.38018         6337       77.941         2.837         lw1      20879786
## lw2.NA              NA           NA            NA        <NA>          <NA>
## lw2.38019         6337       72.983         2.943         lw2      20879773
## lw3.NA              NA           NA            NA        <NA>          <NA>
## lw3.38020         6337       70.575         3.367         lw3      20879785
## rw1.NA              NA           NA            NA        <NA>          <NA>
## rw1.38021         6337       74.073         2.088         rw1      20879775
## rw2.NA              NA           NA            NA        <NA>          <NA>
## rw2.38022         6337       73.792         3.049         rw2      20879774
##                              datetime set_interval actual_interval
## atmospheric.NA                   <NA>           NA              NA
## atmospheric.38017 2020-11-01 01:00:00           10      70.0000000
## atmospheric.41101 2020-11-04 14:38:49           10       8.8166667
## atmospheric.41102 2020-11-04 14:38:59           10       0.1666667
## atmospheric.41103 2020-11-04 14:39:05           10       0.1000000
## lw1.NA                           <NA>           NA              NA
## lw1.38018         2020-11-01 01:00:00           10      70.0000000
## lw2.NA                           <NA>           NA              NA
## lw2.38019         2020-11-01 01:00:00           10      70.0000000
## lw3.NA                           <NA>           NA              NA
## lw3.38020         2020-11-01 01:00:00           10      70.0000000
## rw1.NA                           <NA>           NA              NA
## rw1.38021         2020-11-01 01:00:00           10      70.0000000
## rw2.NA                           <NA>           NA              NA
## rw2.38022         2020-11-01 01:00:00           10      70.0000000

Look for large data fluctuations

  1. Plot temperature for quick checks
  2. Plot raw data with flag if raw level changes by more than x%.
threshold <-5
pstage_list <- split(pstage_raw, pstage_raw$position)
temp_list <- lapply(pstage_list, function(x) {
  cbind(x, flag = ifelse(c(0,abs(diff(x$temperature_C)))/x$temperature_C > threshold, 48, 1)) })
tempall <- do.call(rbind, temp_list) 
tempall <-  tempall[c('temperature_C', 'ID', 'datetime', 'position')]

tempall <-tempall %>%
  tibble::rowid_to_column() %>%
  spread(key = position, value = temperature_C)

# The number in temp_list[[]] is indicative of position
tsTemp <- xts(tempall, order.by = as.POSIXct(tempall$datetime))

dygraph(tsTemp) %>% 
            #adds time series launcher
            dyRangeSelector() %>% 
            #adds highlight/fade (controled by alpha) of series and formats circle.
            dyHighlight(highlightCircleSize = 4, 
                        highlightSeriesBackgroundAlpha = 0.2,
                        hideOnMouseOut = TRUE)%>% 
            dyAxis('y',label='Degree C',valueRange = c(-20,20 ))%>%
            dyAxis('y2',label='ID',independentTicks=T)%>%
            dySeries('ID',axis='y2')%>%
            #Assigns legend to follow cursor, also can choose 'always' to always see it
            dyLegend(show = "always")

###compare cleaned water level to manual measurements

#plot difference between measured water depth and manual depth measurement
#stageAdj.man<- left_join(manMeas_fil,stage_adj)%>%
#      mutate(diff = man_wtr_dep_static-wtr_depth)

#ggplot(stageAdj.man, aes(datetime, diff))+
#  geom_point(size=3)+
# theme_minimal()
#plot water level time series with manual measurements as points
#stage_check_plot<- stage_adj%>%
#  left_join(manMeas_fil)%>%
#  ggplot(aes(datetime,wtr_depth))+
#    geom_line()+
#    geom_point(aes(datetime, man_wtr_dep_static),size=3,col='red')

#ggplotly(stage_check_plot)

###save

#stage_final <- stage_adj%>%
 #     dplyr::select(location, site, datetime, wtr_depth,level_flag)
#loc_site<- paste(location,site, sep='_')

#interfiles <- './data/3_cleaned_stage_caprod'
#file_path <- paste(interfiles,location,site, sep='/')
#saveRDS(stage_final, file=paste0('data/cln/wtr_lvl_',loc_site,'.csv'))
#write_csv(stage_final, file=paste0(file_path, '_clean.csv'))